Current Issue: January-March | Volume: 2025 | Issue Number: 1 | Articles: 5
When people tell lies, they often exhibit tension and emotional fluctuations, reflecting a complex psychological state. However, the scarcity of labeled data and the complexity of deceptive information make it difficult to extract effective lie features, which severely restricts the accuracy of lie detection systems. To address this, this paper proposes a semi-supervised lie detection algorithm that integrates multiple speech emotional features. First, a Long Short-Term Memory (LSTM) network and an Autoencoder (AE) network process log-Mel spectrogram features and acoustic statistical features, respectively, to capture the contextual links within each feature type. Second, a joint attention model learns the complementary relationships among the different features to obtain feature representations with richer detail. Finally, the model combines an unsupervised Local Maximum Mean Discrepancy (LMMD) loss with a supervised Jeffreys multi-loss optimization to enhance classification performance. Experimental results show that the proposed algorithm achieves better performance....
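As a rough illustration of the two-branch fusion this abstract describes, the sketch below wires an LSTM over log-Mel frames and an autoencoder over acoustic statistics into a cross-feature attention layer, then mixes a supervised classification loss with an unsupervised reconstruction term. All layer sizes, the attention form, and the loss weighting are illustrative assumptions, not the paper's configuration, and the LMMD/Jeffreys terms are only indicated in comments.

```python
# Hedged sketch of the two-branch fusion; sizes and losses are assumptions.
import torch
import torch.nn as nn

class DeceptionFusionNet(nn.Module):
    def __init__(self, n_mels=64, n_stats=88, hidden=128):
        super().__init__()
        # Branch 1: LSTM over log-Mel spectrogram frames (contextual modeling).
        self.lstm = nn.LSTM(n_mels, hidden, batch_first=True, bidirectional=True)
        # Branch 2: autoencoder over utterance-level acoustic statistics.
        self.enc = nn.Sequential(nn.Linear(n_stats, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, n_stats)
        # Joint attention: the statistics embedding attends over spectral frames.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.proj = nn.Linear(hidden, 2 * hidden)
        self.cls = nn.Linear(2 * hidden, 2)   # truth vs. lie

    def forward(self, mel, stats):
        frames, _ = self.lstm(mel)                   # (B, T, 2H)
        z = self.enc(stats)                          # (B, H)
        recon = self.dec(z)                          # used by the AE loss
        query = self.proj(z).unsqueeze(1)            # (B, 1, 2H)
        fused, _ = self.attn(query, frames, frames)  # cross-feature attention
        return self.cls(fused.squeeze(1)), recon

model = DeceptionFusionNet()
mel = torch.randn(8, 200, 64)    # 8 utterances, 200 log-Mel frames each
stats = torch.randn(8, 88)       # e.g. eGeMAPS-style statistics (placeholder)
logits, recon = model(mel, stats)
# A full semi-supervised objective would add an LMMD term on unlabeled data and
# a Jeffreys-based supervised term; here only a plain CE + reconstruction mix.
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,))) \
       + 0.1 * nn.functional.mse_loss(recon, stats)
```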
As key hardware for brain-like chips based on spiking neural networks (SNNs), the memristor has attracted increasing attention because of its similarity to biological neurons and synapses in processing audio signals. However, designing stable artificial neuron and synapse devices with a controllable switching pathway to form a hardware network remains a challenge. For the first time, we report that artificial neurons and synapses based on multilayered HfOx/TiOy memristor crossbar arrays, which display tunable threshold-switching and memory-switching characteristics, can be used for SNN training on audio signals. We find that the tunable volatile and nonvolatile switching of the multilayered HfOx/TiOy memristor is induced by a size-controlled atomic oxygen-vacancy pathway, which depends on the atomic sublayer in the multilayered structure. The biological neuron’s integrate-and-fire function is successfully emulated through the tunable threshold-switching characteristic. Based on the stable performance of the multilayered HfOx/TiOy neuron and synapse, we construct a hardware SNN architecture for processing audio signals, which provides a basis for audio recognition through the integrate-and-fire function. Our design of an atomic conductive pathway using a multilayered TiOy/HfOx memristor provides a new method for constructing artificial neurons and synapses in the same matrix, which can reduce integration cost in an AI chip. The hardware implementation of synaptic functionality in SNNs paves the way for novel neuromorphic computing paradigms in the AI era....
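For readers unfamiliar with the integrate-and-fire behavior the threshold-switching device emulates, the following software sketch shows the same dynamics in a leaky integrate-and-fire model. The constants (threshold, leak, input range) are illustrative placeholders, not device measurements, and the sketch is not a model of the HfOx/TiOy hardware itself.

```python
# Minimal leaky integrate-and-fire sketch; all constants are illustrative.
import numpy as np

def lif_neuron(input_current, v_th=1.0, leak=0.95, v_reset=0.0):
    """Accumulate input with leak; fire and reset when the threshold is reached."""
    v, spikes = 0.0, []
    for i in input_current:
        v = leak * v + i          # integration with leak
        if v >= v_th:             # threshold switching -> output spike
            spikes.append(1)
            v = v_reset           # volatile device relaxes back (reset)
        else:
            spikes.append(0)
    return np.array(spikes)

rng = np.random.default_rng(0)
# Toy "audio-derived" input train; in the hardware SNN this would come from the
# synapse (memory-switching) crossbar weighting the audio input.
current = rng.uniform(0.0, 0.4, size=100)
print(lif_neuron(current).sum(), "spikes out of 100 steps")
```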
Artificial intelligence and the Internet of Things are playing an increasingly important role in monitoring beehives. In this paper, we propose a method for automatic recognition of honeybee type by analyzing the sound generated by worker bees and drone bees as they fly close to the entrance of a beehive. We conducted a broad comparative study to determine the most effective preprocessing of the audio signals for this detection problem. We compared several methods for representing the signal in the frequency domain, including mel-frequency cepstral coefficients (MFCCs), gammatone cepstral coefficients (GTCCs), the multiple signal classification method (MUSIC), and parametric estimation of the power spectral density (PSD) with the Burg algorithm. These coefficients serve as inputs to an autoencoder neural network that discriminates drone bees from worker bees. The classification is based on the reconstruction error of the signal representations produced by the autoencoder. We propose a novel approach to class separation by the autoencoder neural network with various thresholds between decision areas, including a maximum-likelihood threshold on the reconstruction error. By classifying real-life signals, we demonstrate that drone bees and worker bees can be differentiated based solely on audio signals. The attained detection accuracy enables the creation of an efficient automatic system for beekeepers....
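A minimal sketch of reconstruction-error classification of the kind described above: fit an autoencoder on one class of feature vectors and flag inputs whose reconstruction error exceeds a threshold as the other class. The feature dimension, architecture, and the percentile threshold are assumptions standing in for the paper's exact setup (which also considers a maximum-likelihood threshold).

```python
# Hedged sketch: autoencoder reconstruction error with a decision threshold.
import numpy as np
import torch
import torch.nn as nn

n_mfcc = 13
ae = nn.Sequential(nn.Linear(n_mfcc, 8), nn.ReLU(), nn.Linear(8, n_mfcc))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

worker = torch.randn(500, n_mfcc)          # stand-in for worker-bee MFCC vectors
for _ in range(200):                       # fit the AE on the "worker" class only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(worker), worker)
    loss.backward()
    opt.step()

def reconstruction_error(x):
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)

# Simple percentile threshold between decision areas (a maximum-likelihood
# threshold on the error distribution would be another option).
thr = np.percentile(reconstruction_error(worker).numpy(), 95)
test = torch.randn(10, n_mfcc) * 2.0       # stand-in for drone-bee MFCC vectors
print(reconstruction_error(test) > thr)    # True -> classified as drone
```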
Revitalizing listening skills with advanced sound processing technology is an interesting concept, especially in today's digital era of rapid technological advancement. Listening skill is the ability to understand, analyze, and respond well to information conveyed through sound, and it is a key skill in contexts such as interpersonal communication, business, and education. This includes using speech processing algorithms, speech recognition techniques, and artificial intelligence (AI) to optimize speech understanding and interpretation. In hearing skills training, speech processing technology can power applications that record, analyze, and provide feedback on a person's listening ability; such applications can be used in business education or training. For automatic transcription and translation, voice processing technology can convert conversations in foreign languages into text or translate voice content in real time, helping someone understand a foreign language or content presented in a language they are not familiar with. The same technology can also improve business communication skills. Many companies use voice processing to provide better customer service: advanced interactive voice response (IVR) systems can route customer calls more efficiently and provide the necessary information. Finally, advanced voice processing technology can be used to analyze sentiment in conversations or customer reviews....
In this paper, we present a novel approach to text-independent phone-to-audio alignment based on phoneme recognition, representation learning, and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition with a Connectionist Temporal Classification (CTC) loss, a dimension-reduction model, and a frame-level phoneme classifier trained on forced-alignment labels (from the Montreal Forced Aligner) to produce multilingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state of the art (charsiu) on statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work, but the design of the system makes it easily adaptable to other languages....
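To make the first stage of such a pipeline concrete, the sketch below obtains frame-level phoneme posteriors from a Wav2Vec2 model fine-tuned with a CTC loss. The checkpoint name is an assumption (any phoneme-level Wav2Vec2 CTC model would do), and this is not the authors' full alignment system: the dimension-reduction model and frame-level classifier are not shown.

```python
# Hedged sketch of frame-level phoneme posteriors from a Wav2Vec2 CTC model.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

ckpt = "facebook/wav2vec2-xlsr-53-espeak-cv-ft"   # assumed phoneme CTC checkpoint
processor = Wav2Vec2Processor.from_pretrained(ckpt)
model = Wav2Vec2ForCTC.from_pretrained(ckpt).eval()

waveform = torch.zeros(16000)                     # 1 s of 16 kHz audio (placeholder)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits    # (1, frames, n_phoneme_tokens)
posteriors = logits.softmax(dim=-1)

# Each output frame covers roughly 20 ms, so a per-frame argmax already yields a
# coarse, text-independent phone-to-time mapping that downstream stages refine.
frame_ids = posteriors.argmax(dim=-1)
print(frame_ids.shape, processor.batch_decode(frame_ids)[0][:60])
```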